Text Embeddings by Weakly-Supervised Contrastive Pre-training
Abstract
The model is trained in a contrastive manner with weak supervision signals from our curated large-scale text pair dataset (called CCPairs).
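To make the contrastive objective concrete, below is a minimal sketch of contrastive training with in-batch negatives. It assumes an InfoNCE-style loss over temperature-scaled cosine similarities, which is a common formulation for this kind of pre-training; the temperature value, tensor shapes, and random inputs are illustrative placeholders rather than the paper's actual setup, standing in for encoder outputs on a batch of (query, passage) pairs from CCPairs.

```python
# Sketch: InfoNCE contrastive loss with in-batch negatives (illustrative, not the paper's code).
import torch
import torch.nn.functional as F


def info_nce_loss(query_emb: torch.Tensor,
                  passage_emb: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """The i-th query's positive is the i-th passage; every other passage
    in the batch acts as a negative."""
    q = F.normalize(query_emb, dim=-1)   # L2-normalize so dot products are cosine similarities
    p = F.normalize(passage_emb, dim=-1)
    logits = q @ p.T / temperature       # [batch, batch] similarity matrix
    labels = torch.arange(q.size(0), device=q.device)  # positives lie on the diagonal
    return F.cross_entropy(logits, labels)


# Toy usage: random tensors stand in for encoder outputs on a batch of text pairs.
queries = torch.randn(8, 768, requires_grad=True)
passages = torch.randn(8, 768, requires_grad=True)
loss = info_nce_loss(queries, passages)
loss.backward()  # in real training this would update the shared text encoder
```

Larger batches give more in-batch negatives per positive pair, which is one reason this style of pre-training is typically run with large batch sizes.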
2 Related Works
Most closely related to our work is a series of community efforts by sentence-transformers to train embeddings with a collection of labeled and automatically collected datasets. In this paper, we show that it is possible to train high-quality embeddings using self-supervised pre-training only.